UPSTREAM PR #17030: ggml-cpu: handle 3d tensors in repack mat_mul (#94)
Conversation
Access the complete analysis in the LOCI Dashboard.

**Performance Analysis Summary: PR #94 - 3D Tensor Support in Matrix Multiplication**

Overview: PR #94 introduces 3D tensor support for batched matrix multiplication operations in the GGML CPU backend, specifically targeting models like LFM2.

Key Findings:
- Performance Impact
- Core Function Impact
- Power Consumption Analysis
- Technical Analysis
- Implementation Details
- Actionable Recommendations

The modifications successfully enable batched matrix operations for advanced model architectures while introducing acceptable performance overhead in specialized code paths.
Force-pushed from eadb483 to 0b86651
Access the complete analysis in the LOCI Dashboard.

**Performance Analysis Summary: PR #94 - 3D Tensor Support**

Overview: PR #94 introduces 3D tensor support for repack matrix multiplication operations, specifically targeting models like LFM2 that require batched operations.

Key Findings:
- Performance Impact
- Core Function Impact
- Power Consumption Analysis
- Technical Analysis
- Implementation Changes
- Scope Assessment
Force-pushed from b1ace60 to bff7103
Force-pushed from 733e776 to 2c7fec2
Force-pushed from 048ad94 to 6c1fde6
Force-pushed from d4c3480 to f998d1f
Mirrored from ggml-org/llama.cpp#17030
While testing #16739, I observed that perplexities for LFM2 skyrocketed. @ggerganov pointed out that some matrix shapes were probably not supported.
LFM2 has some layers with two batches, so MUL_MAT operations were computed only partially, leading to incorrect results. See ggml-org/llama.cpp#16739 (comment)
This patch adds basic support for tensors with ne2 > 1, using very naive chunking based on the non-repack MUL_MAT.

Perplexities using this patch:
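To illustrate the idea of handling ne2 > 1, here is a minimal standalone sketch of per-batch chunking: the 3D multiplication is reduced to one independent 2D mat_mul per slice along the third dimension. Names, the row-major layout, and the assumption of contiguous slices are mine for illustration; this is not ggml's actual repack code or API.

```cpp
#include <cstddef>

// Plain 2D matmul: C (rows x cols) = A (rows x inner) * B (inner x cols),
// all matrices stored row-major and contiguous.
static void mat_mul_2d(const float *a, const float *b, float *c,
                       int rows, int inner, int cols) {
    for (int r = 0; r < rows; ++r) {
        for (int j = 0; j < cols; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < inner; ++k) {
                sum += a[r * inner + k] * b[k * cols + j];
            }
            c[r * cols + j] = sum;
        }
    }
}

// Naive 3D batching: treat the tensors as ne2 contiguous 2D slices and
// run one independent 2D mat_mul per slice index i2.
static void mat_mul_3d(const float *a, const float *b, float *c,
                       int rows, int inner, int cols, int ne2) {
    for (int i2 = 0; i2 < ne2; ++i2) {
        mat_mul_2d(a + (size_t)i2 * rows * inner,
                   b + (size_t)i2 * inner * cols,
                   c + (size_t)i2 * rows * cols,
                   rows, inner, cols);
    }
}
```

With this chunking, a backend that only handles 2D shapes correctly can be extended to batched inputs without changing the inner kernel, at the cost of missing any cross-batch optimization.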
I can provide logs for other models if needed.